A duration modeling technique with incremental speech rate normalization
Abstract
This paper describes a novel technique for exploiting duration information in low-resource speech recognition systems. Explicit duration models significantly increase computational cost because they enlarge the search space. To avoid this problem, most techniques that use duration information adopt two-pass or N-best rescoring approaches. In contrast, we propose an algorithm that uses word duration models with incremental speech rate normalization within one-pass decoding. In the proposed technique, penalties are added only to the scores of words with outlier durations, and not every word needs a duration model. Experimental results show that the proposed technique reduces errors by up to 17% on in-car digit string tasks without a significant increase in computational cost.
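The abstract gives only an outline of the method. The minimal Python sketch below illustrates its two named ideas (outlier-only duration penalties and incremental speech rate normalization) under assumptions that the abstract does not state: each word's duration model is a Gaussian over log-duration in frames; speaking rate is tracked as a running average of log(observed / expected) duration over the words decoded so far, shrunk toward normal rate by a prior count; and the penalty grows linearly once the rate-normalized z-score exceeds a threshold. `WordDurationModel`, `duration_penalty`, `score_hypothesis`, `z_threshold`, `scale`, and `prior_weight` are hypothetical names, and this is not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WordDurationModel:
    """Hypothetical per-word model: Gaussian over log-duration (in frames),
    estimated from training data at a normal speaking rate."""
    mean_log: float
    std_log: float


def duration_penalty(frames: int, model: WordDurationModel, log_rate: float,
                     z_threshold: float = 2.0, scale: float = 1.0) -> float:
    """Penalty for one word: zero unless its rate-normalized duration is an outlier."""
    # Remove the current speaking-rate estimate (a subtraction in the log domain).
    z = (math.log(frames) - log_rate - model.mean_log) / model.std_log
    if abs(z) <= z_threshold:
        return 0.0  # durations inside the normal range are not penalized
    return scale * (abs(z) - z_threshold)


def score_hypothesis(words: List[Tuple[str, int]],
                     models: Dict[str, WordDurationModel],
                     normalize: bool = True,
                     prior_weight: float = 1.0) -> float:
    """Total duration penalty for a word sequence, updating the
    speaking-rate estimate word by word from left to right."""
    sum_log_ratio = 0.0    # accumulated log(observed / expected) durations
    weight = prior_weight  # prior count pulling the estimate toward normal rate
    total = 0.0
    for word, frames in words:
        model: Optional[WordDurationModel] = models.get(word)
        if model is None:
            continue  # words without a duration model receive no penalty
        log_rate = sum_log_ratio / weight if normalize else 0.0
        total += duration_penalty(frames, model, log_rate)
        # Incremental update: only already-decoded words affect the estimate.
        sum_log_ratio += math.log(frames) - model.mean_log
        weight += 1.0
    return total


models = {
    "one": WordDurationModel(mean_log=math.log(20), std_log=0.25),
    "two": WordDurationModel(mean_log=math.log(30), std_log=0.25),
}
# A slow speaker: every digit lasts twice its expected duration.
slow = [("one", 40), ("<sil>", 5), ("two", 60), ("one", 40)]
print(score_hypothesis(slow, models))                   # ≈ 0.77: only the first digit is penalized
print(score_hypothesis(slow, models, normalize=False))  # ≈ 2.32: every digit is penalized
```

With normalization, the rate estimate adapts after the first digit, so later digits spoken at the same slow rate are no longer treated as outliers. In an actual one-pass decoder, the running statistics would presumably be carried with each partial hypothesis and the penalty subtracted from its score at word boundaries; the sketch scores a fixed word sequence only to keep the example short.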
Similar resources
On use of duration modeling for continuous digits speech recognition
In this paper, we describe our duration modeling techniques for an HMM-based speech recognizer. With this approach, many deletion and insertion errors can be eliminated in a Mandarin continuous-digit recognizer. We propose a simple duration penalty function that can be explicitly incorporated into Viterbi beam search with negligible additional computational overhead. Different parametric distrib...
Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices
Voice access to cloud applications, including social networks, from mobile devices has become attractive. Personalized speech recognizers on mobile devices are also feasible because most mobile devices have a single user. Speaking rate variation is known to be an important source of performance degradation in spontaneous speech recognition. Speaking rate is speaker dependent; it cha...
A study of tones and tempo in continuous Mandarin digit strings and their application in telephone quality speech recognition
Prosodic cues (namely, fundamental frequency, energy, and duration) provide important information in speech. For a tonal language such as Chinese, fundamental frequency (F0) also plays a critical role in characterizing tone, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digits, and the appli...
Speech Rate Normalization Used to Improve Speaker Verification
A novel approach to speech rate normalization is presented. Models are constructed to capture how a specific speaker's speech rate variation influences phoneme durations. The models are evaluated in two ways. First, the mean square error in phoneme duration with our normalization is compared to the same error without such normalization. The second evaluation...
Speaker independent acoustic modeling using speaker normalization
This paper proposes a novel speaker-independent (SI) modeling approach for spontaneous speech data from multiple speakers. The SI acoustic model parameters are estimated by training separately for inter-speaker variability and for intra-speaker phonetically related variation in order to obtain a more accurate acoustic model. A linear transformation technique is used for speaker normalization to ext...